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A SEQUENTIAL MONTE CARLO APPROACH TO COMPUTING 
TAIL PROBABILITIES IN STOCHASTIC MODELS 

By Hock Peng Chan 1 and Tze Leung Lai 2 
National University of Singapore and Stanford University 

Sequential Monte Carlo methods which involve sequential im- 
portance sampling and resampling are shown to provide a versatile 
approach to computing probabilities of rare events. By making use of 
martingale representations of the sequential Monte Carlo estimators, 
we show how resampling weights can be chosen to yield logarithmi- 
cally efficient Monte Carlo estimates of large deviation probabilities 
for multidimensional Markov random walks. 

1. Introduction. In complex stochastic models, it is often difficult to 
evaluate probabilities of events of interest analytically and Monte Carlo 
methods provide a practical alternative. When an event A occurs with 
a small probability (e.g., 10 -4 ), generating 100 events would require a very 
large number of events (e.g., 1 million) for direct Monte Carlo computation 
of P(A). To circumvent this difficulty one can use importance sampling in- 
stead of direct Monte Carlo changing the measure P to Q under which A is 
no longer a rare event and evaluating P(A) = Eq(LIa) by m~ l Y^iLi ^i^-Ai , 
where {L\, lyii), • • • > (L m , lA m ) are m independent samples drawn from the 
distribution Q, with Lj being a realization of the likelihood ratio statistic 
L := dP/dQ, which is the importance weight. While large deviations theory 
has provided important clues for the choice of Q for Monte Carlo evaluation 
of exceedance probabilities, it has also been demonstrated that importance 
sampling measures that are consistent with large deviations can perform 
much worse than direct Monte Carlo (see Glasserman and Wang [18]). Chan 
and Lai [8] have recently resolved this problem by showing that certain 
mixtures of exponentially twisted measures are asymptotically optimal for 
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importance sampling. For complex stochastic models, however, there are im- 
plementation difficulties in using these asymptotically optimal importance 
sampling measures. Herein we introduce a sequential importance sampling 
and resampling (SISR) procedure to attain a weaker form of asymptotic 
optimality, namely, logarithmic efficiency; the definitions of asymptotic op- 
timality and logarithmic efficiency are given in Section 3. 

Instead of applying directly the asymptotically optimal importance sam- 
pling measure Q that is difficult to sample from, SISR generates m sequen- 
tial samples from a more tractable importance sampling measure Q and 
resamples at every stage t the m sequential sample paths, yielding a mod- 
ified sample path after resampling. The objective is to approximate the 
target measure Q by the weighted empirical measure defined by the resam- 
pling weights. Details are given in Section 2 for general resampling weights 
(not necessarily those associated with the asymptotically optimal resampling 
measure). Section 4 illustrates the SISR method for Monte Carlo computa- 
tion of exceedance probabilities in a variety of applications which include 
boundary crossing probabilities of generalized likelihood ratio statistics and 
tail probabilities of Markov random walks. These applications demonstrate 
the versatility of the SISR method and the relative ease of its implementa- 
tion. 

Our SISR procedure to compute probabilities of rare events is closely 
related to (a) the interacting particle systems (IPS) approach introduced 
by Del Moral and Gamier [14] to compute tail probabilities of the form 
P{V{Xt ) > a} for a possibly nonhomogeneous Markov chain {X{\ and (b) the 
dynamic importance sampling method introduced by Dupuis and Wang 
[16, 17] to compute P{S n /n £ A}, where S n = Ylt=l9(Xt) an d {X n } is 
a uniformly recurrent Markov chain with stationary distribution ir such 
that j g(x)dir(x) ^ A. Both IPS and dynamic importance sampling gen- 
erate the Xi sequentially. Dynamic importance sampling uses an adaptive 
change of measures based on the simulated paths up to each time t < n. 
A recent method closely related to dynamic importance sampling is se- 
quential state-dependent change of measures introduced by Blanchet and 
Glynn [3] for Monte Carlo evaluation of tail probabilities of the maximum 
of heavy-tailed random walks. The IPS approach uses "mutation" to sam- 
ple X^ (conditional on the x[ 1 ' , X^ already generated) from the orig- 
inal measure P and then uses "selection" to draw m i.i.d. samples from 

{{x[^ , . . . , , xf\) :l<i< m} according to a Boltzmann-Gibbs particle 
measure. The theory of IPS in [14] focuses on tail probabilities of V(Xt) for 
fixed t as described in Section 2 rather than large deviation probabilities of 
g(S n /n) for large n as considered in Section 3. Our SISR procedure is mo- 
tivated by rare events of the general form {X n £ T} that involves the entire 
sample path X n = {X\, . . . ,X n ) and includes {V(X n ) > a} and {S n /n E A} 
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considered by Del Moral and Garnier, Dupuis and Wang as special cases. 
The sequential importance sampling component of SISR uses an easily im- 
plementable approximation Q of Q; in many cases it simply uses Q = P. 
Thus, it is quite different from dynamic importance sampling even though 
both yield logarithmically efficient Monte Carlo estimates of P{S n /n E A}. 

2. Sequential importance sampling and resampling (SISR) and martin- 
gale representations. The events in this section are assumed to belong to 
the a- field generated by n random variables Y\ , . . . , Y n on a probability space 
(J7, P) . Let Yt = (Y\ , . . . , Yt) for 1 < t < n. For direct Monte Carlo compu- 
tation of a := P{Y n E T}, i.i.d. random vectors Y^, . . . , Y^ m ^ are generated 
from P and a is estimated by 



The estimate 2d is unbiased and its variance is a(l — a)/m which can be 
consistently estimated by 



In most stochastic models of practical interest, the Yt are either indepen- 
dent or are specified by the conditional densities pt(-\Y t ~i) of Y t given "Yt-li 
with respect to some measure v. Direct Monte Carlo computation of P{Y n E 
r}, therefore, involves Y^ , . . . , Yn that are generated sequentially from the- 
se conditional densities for 1 < i < m. In contrast, SISR first generates m inde- 
pendent random variables Yj , . . . , Y( at stage t, with Y^ having den- 
sity function ^(-IY^j) to form Y^ = (Y^ l5 Y^) and then uses resampling 
weights of the form Wt(y't)/'Y2jLi w t("^'t) ^° draw m independent sam- 
ple paths Yp\ l<j<m, from {Y^, 1 < i < m}. Here qt are conditional 
density functions with respect to v such that qt > whenever pt > 0; one 
particular choice is qt = Pt- In Section 3, we show how the weights wt can 
be chosen to obtain logarithmically efficient SISR estimates of rare event 
probabilities. 

The preceding SISR procedure uses bootstrap resampling that chooses i.i.d. 
sample paths from a weighted empirical measure of {Y^ , 1 < i < m}. It is, 
therefore, similar to the selection step of the IPS approach that chooses i.i.d. 
"path-particles" from some weighted empirical particle measure (see [14]). 
The Monte Carlo estimate of a using SISR with bootstrap resampling is 



m 



(2.1) 




(2.2) 



Sd := «d(1 - ocT>)/m. 



in 



(2.3) 




i=l 
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where ho = l and 

~ k 

Pt{yt\yt-i) t / \ tt w t 



(2.4) 



I — / I \ ) '"re Vv K I / \ ; 



m 
i=l 



Chan and Lai [9] have recently developed a general theory of sequential 
Monte Carlo filters in hidden Markov models by using a representation sim- 
ilar to the right-hand side of (2.3) for these filters. The method of their 
analysis can be applied to analyze m{a-Q — a), decomposing it into a sum 
of (2n — l)m terms so that the summands form a martingale difference se- 
quence. Let E* denote expectation under the probability measure Q from 
which the Y^ and Y t are drawn and define for 1 < t < n, 

(2.5) f t (y t ) = £*[L(Y n )l {Yner} |Y t = y t ] = L(j t )P(Y n e T\Y t = y t ), 
setting fo = a and / n (Y n ) = L(Y n )l|Y ng pj- An important ingredient in the 

(i) (i) 

analysis is the "ancestral origin" a\ of Y^ . Specifically, recall that the "first 
generation" of the m particles consists of , . . . , Y^ m ^ (before resampling) 

(i) (i) ~ (i) (i) 

and set a t = j if the first component of Y^ is Y l . Let # k denote the 

number of copies of generated from { Y^ , . . . , Y^ } to form the m 

particles in the Arfch generation and let = Wk(Y k % ') / YljLi w k0^^ )■ Then 
it follows from (2.4) and simple algebra that for 1 < i < m, 

mw® = h t - 1 {Y®J/h t (?®), 

y / t (Y«)^(Y«)= oPft&htH&P), 

■ (>) ■ ■ (») 

n 



y e [/rtV/MfYa)]^^ 



+E E mli-^w^ft-^^h^iY^ 
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recalling that fo = a, h,Q = l and defining Oq = i. Let 



4ii= E [/*(Y? , )-/^i(Y2i)]**-i(Y2i) f°rl<t< 

(2.6) 



» ;a t-i=J 



4t= E (#r-W l} )[/ t (YW)^(YW)-«] forl<t<n-l. 
• w 

*:<it_l=J 

Then for each fixed j, {s^\ 1 < t < 2n — 1} is a martingale difference se- 
quence with respect to the filtration {J^, 1 < t < 2n — 1} defined below and 

m 

(2.7) m( a B -a)=E( £ ? ) + --- + e 2°n ) -l)- 

The martingale representation (2.7) that involves the ancestral origins of 
the genealogical particles is useful for estimating the standard error of 2b, 
as shown by Chan and Lai [9] who have also introduced the cr-fields 

?2t-i = <t{{Y^ ■ 1 < i < m} 

(2.8) U {(YWy^loW) : 1 < s < t, 1 < i < m}), 
•Fa = <r(F 2 t-x U {(Y t W ,a?) : 1 < i < m}) 

with respect to which (2.6) forms a martingale difference sequence. 

Since / n (Y«) = L(Y^)l { YW er} and ££i(#t° " ™wf) = for 1 < t < 
n — 1, summing (2.6) over t and j yields (2.7). Without tracing their ancestral 
origins, we can also use the successive generations of the m particles to form 
martingale differences directly. Specifically, in analogy with (2.6), define for 
i = 1, . . . ,m, 

Z$-i = Ift&P) - /^i(Y» 1 )]fc t - 1 (Y£ 1 ) for 1 < t < n, 

(2.9) 

m 

Z® = f t (Yf>)h t (Y?) - E^Vt(Yp' ) )/ it (Yp-)) for 1 < t < n - 1. 

As noted by Chan and Lai [9], {(zf } , . . . , Z^ m) ), 1 < t < 2n - 1} is a mar- 
tingale difference sequence with respect to the filtration {J-t, 1 < t < 2n — 1} 
and Zf , . . . , z[ m ^ are conditionally independent given J-t-i', moreover, 



2n-l 

(2.10) m{a B - «) = E ^ + ' ' ' + Z * 
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From the martingale representation (2.10) it follows that E*(c£b) = a. 
Moreover, under the assumption that 



(2.11) ai:=X>* 



i=l 



t-1 



/t 2 (Y t )/n«fc(Y fc ) 



k=l 



E* 



t-i 



lw k (Y k ) 



na < oo, 



application of the central limit theorem yields 

(2.12) y/m(aB - cx) =^ N(0,a%) asm— >-oo. 

A consistent estimate of o" B is given by 



(2.13) 



7 = 1 L . (i) 



n-1 



i+E E (# 



(0 ™„„«> 



QB 



which can be shown to converge to ct b in probability as m — > oo by making 
use of the martingale representation (2.7) (see [9] for details). Del Moral and 
Jacod [15] have derived by a different method a martingale representation 
similar to (2.10) (see [15], (3.3.7) and (3.3.8)), in which the term Z^_ 1 

in (2.9) corresponds to the ith mutation on the ith. particle and Z^f the 
ith selection by the ith particle. In [15], these two terms are combined into 
a sum and a central limit theorem similar to (2.12) is proved under the 
assumption of bounded f n . 

Note that in (2.12) on the asymptotic normality of «b and in the consis- 
tency result ct b A <7 B , the sample size n in the probability a = P{Y n G T} 
is assumed to be fixed whereas the number m of Monte Carlo samples ap- 
proaches oo. The consistent estimate cr B of u B in (2.13) provides an estimate 
^b/v^ °f the standard error (s.e.)(aB) of the Monte Carlo estimate Sb- 
Note that the usual estimate ^Jo.-q{1 — «b) is inconsistent for ^Jm s.e.(aB) 
because of the dependence among the m sample paths due to resampling 
in the SISR procedure as in [13, 14]. The case of n approaching oo will be 
considered in the next section in which the representation (2.6) will still 
play a pivotal role, but which requires new methods and large deviation 
principles rather than central limit theorems. 

Instead of bootstrap resampling, we can use the residual resampling sche- 
me introduced by Baker [1, 2] which often leads to smaller asymptotic vari- 
ance than that of bootstrap resampling. We consider here a variant of this 
scheme introduced by Crisan, Del Moral and Lyons [11] that can result in 
further reduction of the asymptotic variance. Let [ - J denote the greatest 
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integer function and let mt be the sample size at stage t with mi = m. 
We modify the bootstrap resampling step of the SISR procedure as fol- 
lows: let U^ l \ . . . , Ut™^ be independent Bernoulli random variables satisfy- 



ing P{Uf = 1} = mtw\ 11 — [rntwl l> J . For each 1 < i < mt and t < n, make 
#y := \_mtwf^ \ + copies of (Y^\af_ 1 ,hl_ 1 ,w^). These copies con- 
stitute an augmented sample {(Y^ , op, hff\ w[ ) : 1 < j < m t+ i}, where 



,(0 



,(0 



"H+l = ET=i #t° and ^ = ^-i/K^D- Estimate a by 



,(») _ ».(0 



a R := m- 1 £ L(Y»)^« i(Y « j}1 



{Yi l) G r}- 



Define by (2.6) in which m is replaced by m t and define T-it-i (or J-~2t) 
by (2.8) in which m is replaced by m s+ \ (or by mt-|-i). Moreover, define 



Z. 



(0 

2t-l 

?(0 

y 2t 



/t-iCY^JJ/it-iCY^i) for l<t<n, 



r(i) 



- mtto? } )[/ t (Y^)^(Y t w ) - a] for 1 < t < n - 1, 



(0> 



for i = 1, . . . ,mt- Recall that the first generation of particles consists of Y± , 



. . . , Y^ m ^ and that a 



(0 



j if the first component of Y^ is Y^ for j 



l,...,m and i = 1, . . . , mt+i. Analogous to (2.7) and (2.10), we have the 
martingale representations 



(2.14) 



m n ,(S R - a) = J^^i + • • • + eg-i) 



2n-l 



(" l L(fe+l)/2J ) 



). 



fc=l 



Analogous to (2.13), define 



4 



m 



_1 Ei E umK-iiY^) 



■ W 



n-1 



i+E E (#1 



(i) 



m t w t 



From (2.14) it follows that £*[m„(«R - a)] = 0. Let 



Vt = E* 



i[w k (Y k ) 



K(yt) = Vt/ Y[w k (y k ), 
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and let j(x) = (x — \_x\)(l — x + L^J)/^ f° r x > 0. If (2.11) holds, then anal- 
ogous to corresponding results for Sb and <7 B in the bootstrap resampling 
case, we now have as m — > oo, 

cr R — > <7 R , mt/m—tl for every t > 1, 
V^Sr -«)=>• iV(0, cr|), 

where cr^ < er B and 

n 

4 := X>*{[/'( Y *) " / t 2 -i(Y t -i)]^_l(Yt-i)} 



t=i 



*=i 



7 



hU(Yt-i)\[ft(Y t )h* t (Y t )-aY 



h*(Y t ) J h* t (Y t ) 



Details are given in [9]. Note the additional variance reduction if residual 
resampling is used instead of bootstrap resampling. 



3. Logarithmically efficient SISR for Monte Carlo computation of small 
tail probabilities. Let £,£l,£2j- • ■ be i.i.d. (/-dimensional random vectors 
with a common distribution function F such that ip(9) := log(Ee e ^) < oo 

for ||0|| < 6» . Let S n = £i H \- £ n , Mo = E£, ® = {0:^(0) < oo} and let A 

be the closure of V^(Q) and A° be its interior. Assume that for any 6q € 0° 
and 9 G 9 \ 6°, 

lim(0 - e o )'Vi>(0 o + p(0 - O )) = oo. 

Then by convex analysis (see, e.g., [4], Chapter 3), A contains the convex hull 
of the support of {S n /n, n > 1}. The gradient vector VV> is a diffeomorphism 
from 0° onto A°. For given /i E A° let 6^ = (V^) _1 (/z) and define the rate 
function 

(3.1) 0(m) = sup{0'/i - = 0> " ^(^)- 

flee 

We can embed -F in an exponential family {Fg,9 G 0} with 

dFe{x) = e ' x ~ m dF(a:). 

Under certain regularity conditions on g : A — >• R, Chan and Lai [6] have 
developed asymptotic approximations, which involve both g and (j>, to the 
exceedance probabilities 

(3.2) Pn = P{g{S n /n) > b} with b > g(fi ), 

(3.3) Pc = P\ max ng(S n /n)>c\, 

Lno<n<ni J 
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where no ~ poc and n\ ~ p\c such that g(po) < p\ . Making use of these 
approximations, Chan and Lai [8] have shown that certain mixtures of ex- 
ponentially twisted measures are asymptotically optimal for Monte Carlo 
evaluation of (3.2) or (3.3) by importance sampling. Specifically, for A = 
{g(S n /n) > b} in the case of (3.2) or A = {max no < n < ni ng(S n /n) > c} in the 
case of (3.3), an importance sampling measure Q (which may depend on n 
or c) is said to be asymptotically optimal if 



in the case of (3.3), where (L\, Iai), ■ ■ ■ , (L m , lA m ) are m independent real- 
izations of (L := dP/dQ, 1a)- For the case of (3.3), since Eq(LIa) = P(A) = 
p c , Eq(L 2 1a) > Pc by the Cauchy-Schwarz inequality and, therefore, Q 
is an asymptotically optimal importance sampling measure if Eq(L 2 1a) = 
0(p 2 ), which leads to the definition (3.5) of asymptotic optimality for the 
Monte Carlo estimates. Chan and Lai [8] have also shown that \fnp\ is 
an asymptotically minimal order of magnitude for Eq(L 2 1a) in the case 
of (3.2). They have also extended this theory to Markov random walks S n 
whose increments £j have distributions F(-\Xi, Xi_i) depending on a Markov 
chain {X^}. 

The asymptotically optimal mixtures of exponentially twisted measures 
f Po^udj,) dp in [8] involve normalizing constants /3 n (or f3 c ) that may be 
difficult to compute. Moreover, it may even be difficult to sample from the 
twisted measure Pg^, especially in multidimensional and Markovian settings. 
In this section we show that by choosing the resampling weights suitably, 
the SISR estimates Sb can still attain 

(3.6) m Var(aB) = vii e °^ as m — )• oo and n — > oo 
for Monte Carlo estimation of p n and 

(3.7) mVar(S?B) =p 2 e°^ as m — > oo and c— > oo 

for Monte Carlo estimation of (3.3). Moreover, (3.6) and (3.7) still hold 
with q?b replaced by Sr. The properties (3.6) and (3.7) are called loga- 
rithmic efficiency; the variance of the Monte Carlo estimate differs from 
the asymptotically optimal value by a factor of e°( n ' (or e°^) noting that 
— n^ 1 logp„ and — c _1 logp c converge to positive limits. To begin with, sup- 
pose the asymptotically optimal importance sampling measure Q has con- 
ditional densities qt{-\Yt-i) with respect to v. To achieve log efficiency, the 



(3.4) 




as n — > oo 



in the case of (3.2) and if 



(3.5) 
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resampling functions wt can be chosen to satisfy approximately 
(3-8) w t (y t ) oc q t {yt\yt-i)/qt(yt\yt-i) 

as illustrated by the following example, after which a heuristic explanation 
for (3.8) will be given. 

Example 1. Suppose ^1,^2, ■■■ are i.i.d. random variables (d= 1) and 
g(x) = x in (3.2), so that a = p n = P{S n /n > b}, where b > E£i and 29^ £ G. 
Consider the SISR procedure with Q = P (and, therefore, E* = E) and 
resampling weights 

(3.9) w t {Y t ) = e eb ^-^ db) . 
Then L = 1 and hence, by (2.5), 

(3.10) f t (Y t ) = P{S n /n > b\Y t } = P{S n -S t >nb- S t \S t }. 

Therefore, standard Markov's inequality involving moment generating func- 
tions yields 

(3.11) /t(Yt) < e - e b(nb-S t )+(n-t)4>(e b ) _ e e h St-t^{9 h )-n<t>{b) _ 

By (2.6) and the martingale decomposition (2.7), 

n 

E(a B - a) 2 < m-^EmY^) - / t - 1 (YW)] 2 / i f_ 1 (Y«)} 

(3.12) 

n— 1 

+ m" 1 ][>[(#« ~ m W ^f?(Y^)hj(Y^)], 
t=i 

in which the superscript W can be replaced by (*' since the expectations 
are the same for all i. The derivation of (3.12) uses the independence of 

[/t(Y t (i) ) - / t _i(Y t W 1 )]/i t (Y i ( ! ) 1 ) for 1 < i < m when conditioned on T 2t -2 

and the pairwise negative correlations of (#|^ — mw^)ft(Y^)ht(Yf') for 
i = 1, . . . ,m when conditioned on Tit-\ - By (2.4), (3.9) and (3.11), 

£{[/*(Yf } ) - / t - 1 (Y«)] 2 / i ti(Y«)} 
(3.13) = E{w\ ■ ■ ■ wlAftiY?) - /.^(Y^)] 2 ^ 5 -" 2 ^ 1 ^^)} 

\ m J 

To see the inequality in (3.13), condition on T^t-i- Since I?[/j(Y^)| J-^t-i] = 
/t.^Y^), it follows from (3.11) that 

EiiMY^) - ^^(Ya)] 2 ^^-^ 1 ^)! .F 2t -i} 

< ^[/ t 2 (Yf ) )/e 29i " Si( - )l ~ 2( *" 1) ' i/ ' (e6) |^-i] < e -2 "*®^^ 6-2 ^). 
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Moreover, w\ , . . . are i-i-d- random variables with mean 

--l + m^Eie 9 ^-^*) -l) 2 



(3.14) E 



m -iy- (e ^ !) -^)_i ) + i 



i=l 



and their product wf ■ ■ ■ w%_i in the second term of (3.13) is J^t-i-measurable. 
This yields the inequality in (3.13). 

Since the conditional distribution of ^f 1 given T%t—\ is Binomial (m, wf ) , 

E[(#f -mwff^t^mwf . By (2.4), (3.9) and (3.11), f t (Y® )ht(Y®)< 

tDi---u) 4 e- n ^. Since 

= 1, it then follows by conditioning on Tit-\ 

that 

m 

= m" 1 ]>>{(#« ~ mw?ff?{Y?)h*{Yf)} 
i=i 

which can be combined with (3.14) to yield 
(3.15) £?[(#« 

where K = E(e e ^-^ - l) 2 . By (3.12), (3.13) and (3.15), 

1 # 

liminf logfm Var(aB)] > 2^>(6) 

n— >oo n to 

for any fixed m. Since p n j\n~ x l 2 e~ n ^^\ is bounded away from and oo 
(see [8], page 451), (3.6) holds. 

3.1. A heuristic principle for efficient SISR procedures. The asymptoti- 
cally optimal importance sampling measure for p n = P{S n /n >b} '\sQ under 
which £i,£2) • • • are i-i-d. with density function e^ - ^ 6 ) with respect to P 
(see [8]). Since we have used Q = P in Example 1, (3.9) actually follows 
the prescription (3.8) to choose resampling weights that can achieve an ef- 
fect similar to asymptotically optimal importance sampling. We now give 
a heuristic principle underlying this prescription. The SISR procedure uses 
the importance weights p^'V?* (f° r the change of measures from P to Q) 

(i) 

and resampling weights w\ , 1 < i < m, for the m simulated trajectories at 
stage t. The resampling step at stage t basically converts (Y ( f l ,pf > /q^\wf^) 
to (Y^ ,pl /($ w[ ),1), and, therefore, the prescription (3.8) for choosing 
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resampling weights (satisfying q\ w\ = q\ f ) is intended to yield the de- 
sired importance weights pf* /qf^ . To transform this heuristic principle into 
a rigorous proof of logarithmic efficiency, one needs to be able to bound the 
second moments of the importance weights and resampling weights. This 
explains the requirement 26b S 6 in Example 1. 

Example 1 indicates the key role played by the martingale decomposi- 
tion (2.7) and large deviation bounds for P(r n |Yfc), 1 < k < n, in the deriva- 
tion of asymptotically efficient resampling weights. To generalize the basic 
ideas to the more general tail probability (3.2) with nonlinear g, we provide 
large deviation bounds in Lemma 1, whose proof is given in the Appendix, 
for 

(3.16) P{g{{x + S nM )/n)>b}, 

where S Ut k = S n — Sk', note that (3.16) is equal to P{g(S n /n) > b\Sk = x). 
The special case k = and x = has been analyzed by Chan and Lai (see 
Theorem 2 of [6]) under certain regularity conditions that yield precise sad- 
dlepoint approximations. The probability (3.16) is more complicated than 
this special case because it involves additional parameters x and k, but we 
only need large deviation bounds rather than saddlepoint approximations 
for logarithmic efficiency. Let \iq = Vip(6) and define 

(3.17) / = inf{0(/i): 5 (//)>6}, 

(3.18) M = {9:4>( t i e )<I}. 

Lemma 1. Let b > g(fio) . Then as n —> oo , 

(3.19) P{g((x + S n , k )/n)>b}<e- nI+ °^ [ e^-^dfl, 

where the o(n) term is uniform in x and k. 

The proof of (3.19) in the Appendix uses a change-of-measure argument 
that involves the measure Q for which 

(dQ/dP){Y n )= [ e e ' s "- ni ' [e) d6/wo\{M). 
Jm 

The bound (3.19) is used in conjunction with the inequality J M e 9 ' x - k ^( d ) g]Q < 
vol(M) exp{kmax$£M[9'x/k — ip(9)]} to prove the following theorem. 

Theorem 1. Letting b > g{no) , assume: 

(CI) g is twice continuously differentiable and Vg 7^ on N := {/1 € 
A :g(ji) = b}. 

(C2) Ee 2 ^ 1 ^ < 00, where k = sup 9gA/ ||#|| and M is defined in (3.18). 
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Let 9q = and define for 1 <t <n, 

9 t = argmax{9'S t /t - ^(0)}, 

(3.20) 

w t (Y t ) = eMO'tSt - tiP(9 t ) - - (t - l)^(gt-i)]}. 

Wii/i Q = P and the resampling weights thus defined, the SISR estimates 3b 
and Sr are logarithmically efficient, that is, (3.6) holds for Sb and also 
with Sr in place of Sb if m^- oo and n — > oo . 

Besides (3.19), the proof of Theorem 1 also uses the bounds in the fol- 
lowing lemma. These bounds enable us to bound E(wl_ 1 \F 2 (t-i)-2) m the 
proof of Theorem 1. 

Lemma 2. With the same notation and assumptions in Theorem 1, there 
exist nonrandom constants e$ and K > sitc/i t/iai 

lim s t = 0, ^K(Y t )|S*i_i] < e £t and 

(3.21) 

EK 2 (Y t )|5 t _i] < K for all t > 1. 
Proof. Let ^ = sup e6Af |^(f)|. Then 

9[S t - tytflt) = fe-i - (t - lty(0 t )] + [0& - V(^)] 

(3.22) 

< ^_i5 t _i - (t - i)V(0 t -i)] + fe* - ^)] 

and, therefore, it follows from (3.20) that w t {Y t ) < e^&W+i. Hence, by (C2), 

(3.23) £K(Y t )l { | N | >c} |SU] < E[e K ^ +r >l {m>0 ] -+ as C ^ oo. 
It will be shown that for any fixed £ > 0, 

(3.24) 7 t) £:=esssup||0 t -0 t _i||l{||£ t ||<£}->O as t -> oo. 

Let ?? = sup 06M ||V^(0)||. Combining (3.24) with (3.20) and (3.22) yields 

EH(Y t )l { || €t ||< c} |S t -i] < S^-^^l^i^,,^!^-!] 

(3.25) < e^e^Ete^-i^-^^-^ISt-i] 

= l + o(l) 

as i — 7- oo. Moreover, by (C2) and (3.24), as £ — ?■ oo, 

^(Y,)!^^^!^!] < £[e 2 *ll+ 2 n {||a||>f} ] -+0 

(3.26) i?K 2 (Yt)l { || 5t ||<c}l^-i] < e 2 ^.<^E[^-i€*-^^- 1 )|5 t _ 1 ] 

<supe^- 2 ^+ (l), 

and (3.21) follows from (3.23), (3.25) and (3.26). 
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To prove (3.24), let f x ,t(6) = Q'x — tip(8) and let 8 X)t be the unique maxi- 
mizer of f x> t(8) over M. Let A m i n (-) denote the smallest eigenvalue of a sym- 
metric matrix. Since V 2 ip(8) is continuous and positive definite for all 9 G M, 
and since M is compact and A m i n is a continuous function of the entries of 
VV(0), inffleAf AnuntVVW) > 2/3 for some /? > 0. Therefore, by Taylor's 
theorem, f x , t -i(8) < f x ,t-i(8 x , t -i) - Pt\\0 x>t -i - 8\\ 2 for all 6 G M . It then 
follows that for ||y — x\\ < £, 

fy,t%,t-l) < fy,t{0y,t) = fx,t-lWy,t) + 0'y,t{y ~ x) - Wy.t) 

< f x ,t-i(o x , t -i) - pt\\6 x , t -i - M 2 + e' y t { y - x) - rp(e y>t ) 



< f y ,t(9 x ,t-i) - /3t\%,t-i - y ,t\\ 2 + (C + rf)\%,t-i 



7 y,t\ 



and, therefore, — 6y,t\\ < (( + V) / {fit) ■ Hence, (3.24) holds by setting 

x = St-i and y = St- □ 

Proof of Theorem 1. To simplify the notation, we will suppress the 
superscript W in below. By (2.4) and (3.20), 



(3.27) ^_i(Y«) = ( J] wA exph^W + (t - l)V(^-i)]- 



\k=l 



Making use of £[/(Y t (1) )| J" 2t _ 2 ] = /^(Y^), £(su PeeM ^t-^W) < oo and 
the independence of ?D 2 • • -w 2 _i and £t, we obtain from Lemma 1 and (3.27) 
that 

£{lmy; i} ) - / t „ 1 (Y t ( i ) 1 )] 2 / l ti(Y f ( i ) 1 )} 

(3.28) < £{™ 2 • ■-wlJi(YU)/eMM't-iSl 1 J 1 - 2(t - l)V&-i)]} 

< e -2nX+ (n)^2... -2_ i} 

By (2.4) and Lemma 2, 

/ m \ 2 

^ t 2 -il^2( t -i)-2) = m-^EK-^YgjI^ 



J t-2J 



+ m" 2 X;VarK_ 1 (Y2 1 )|5;!. ) 2 ] 



r(i) \ | cW 

V CXI [07^_1 ^ X 
8=1 

< (I + Km-^e 26 *- 1 
and proceeding inductively yields 

(3.29) E{w\---w 2 „ l ) < (l + Km" 1 )*- 1 exp^2e^ < e *(*-i)/™+°(») . 
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Similarly, under bootstrap or residual resampling, 
^[(#«-^f)) 2 /t 2 (Y«)^(YW)] 

m 

(3.30) =m- 1 ^E[(#f -mwfff^Y^Y?)] 

i=i 

<e~ 2nI+0 ^E(wl--w 2 t ). 

By (CI), p n = e - n/ +°( n ) (see [6], Theorem 2) and hence, it follows from (3.12) 
and (3.28)-(3.30) that both q-r and Sb are logarithmically efficient. □ 

The heuristic principle described in the paragraph following Example 1 
can also be used to construct logarithmically efficient SISR procedures for 
Monte Carlo evaluation of (3.3) as illustrated in the following example. 

Example 2. Let T c = inf{n:S n > c}. Consider the estimation of p c = 
P{T C < ni} [i.e., with d = 1 and g(x) = x] when /io < and n\ ~ ac for some 
a > l/ijj'(0*), where 9* is the unique positive root of il)(9*) = 0. We shall 
assume 2$* 6 and use the importance measure Q = P and resampling 
weights 

(e e ^ , if t<T c , 

[i, if m > t > t c . 

Let r]{Y TcAni ) = e 9 *^^!-^ . Since r/(Y TcAni ) > l {maXn < ni s n >c}, it follows 
that 

(3.31) f t (Y t ) = p{maxS n > c\Y t \ < E[ V (Y TcAni )\Y t ] = e 9 * {ST ^~ c \ 

Making use of (3.31) in place of (3.11), we obtain that, analogous to (3.13), 
E{[f t (Y^) - /t_i(Y« )] 2 ^_i(Yff 1 )} 

m ) 

where = E(e e *^ 1 — l) 2 and that, analogous to (3.15), 
(3.33) EK^-m^^V^Yf^^YW^offl + ^V^e-^ 



(3.32) 



m 

Hence, by (3.12) (with n\ in place of n), (3.32) and (3.33), 

mVar(S?B) = 0(n\ exp[(niK*/m) — 20*c\). 

Since ri\ = 0(c) and p c /e * c is bounded away from and oo, as shown 
in [22], (3.7) also holds. 
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In Theorem 2, we provide the resampling weights for logarithmically effi- 
cient simulation of (3.3), for which the counterparts of (3.17) and (3.18) are 
also provided. The basic idea is to use the resampling weights (3.20) up to 
the stopping time 

(3.34) T c = inf{n > hq : ng(S n /n) > c} A n\. 

Theorem 2. Letg(fj, ) <a~ l , n = 5c+O(l) andn\ = ac+0(l) as c— >oo 
for some a > 5 > 0. Let I = inf{(j)(p) : g(fi) > 5' 1 } and M = {9 : 0(/i e ) < /}. 

Let Q = P and assume that (C1)-(C2) hold for all a" 1 <b< and that 
(C3) r :=sup fi:gM > a -imm{g(fi),5- 1 }/<t>(n) < oo. 

Let 6q = and define for 1 < t < n\ — 1, 6t = argmaxggjv/ [^'S^/i — ^(9)] and 

(3.35) w t (Y t ) = I ^^H^A-i-^D^O], if t < T c , 

1 1, ifm>t>T c . 
Then ( 3. 7) holds for Sb o,nd with 2b replaced by Sr. if ' m — > oo and c — > oo . 

Proof. Let u= (t - 1) A T c (1) . By (2.4) and (3.35), 

(3.36) ^i(y£i) = ^n^) exp[-®)) ; 5« +u^)\. 
Let /fe = inf {</>(//) : <?(/i) > 6}. By Lemma 1, 

f t (Y^) = P{Tp< ni \Y^} 



(3.37) '* 

ni 




-nI c/n +o(n) / ^SP-tyWM, ift< Tc (1) 

,(1) 



if t > T 6 ^ 
Note that 

• r ,-lr • J • r <Km) ■ r <K/*)\ -1 

mf b ift = mm< mf , mf > = r 

by (C3). Hence, by (3.36) and (3.37), 

(3.38) £{[.MY t (1) ) - /t-i(Y« jf^CY^)} < e-^W^^ • . -tB^)- 
Similarly, it can be shown that under either bootstrap or residual resampling, 

(3.39) £[(#?> - m^) 2 /t 2 (Yf ^(Y«)] < e" 2 ^^^ • ••<,). 

By (CI) and Theorem 2 of [6], p c = e ~ c / r +°( c ) a nd hence, it follows from 
(3.29), (3.38) and (3.39) that both and S?b are logarithmically efficient. 

□ 
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3.2. Markovian extensions. Let {(Xt,St) ■ t = 0, 1, . . . , } be a Markov ad- 
ditive process on^x R d with transition kernel 

P(x, AxB):=P{(X 1 ,S 1 )eAx(B + s)\(X ,S ) = (x,s)} 

= P{(X 1 ,S 1 ) eAx B\{X ,S ) = (x,0)}. 

Let {X n } be aperiodic and irreducible with respect to some maximal ir- 
reducibility measure (p and assume that the transition kernel satisfies the 
minorization condition 

(3.40) P(x, AxB)> h(x, B)u(A) 

for any measurable set Ac X, Borel set B C R d and s S R d for some prob- 
ability measure v and measure h(x,-) that is positive for all x belonging 
to a (^-positive set. Ney and Nummelin [19] developed a theory to an- 
alyze large deviations properties of S n under (3.40) or when its variant 
P(x,A x B) > h(x)v{A x B) holds. Let r be the first regeneration time and 
assume that n := {(0, () : E v e e ' s -~< < oo} is an open neighborhood of 0. 
Then for all 9 € := {0 : (9, Q € Q for some £}, the kernel 

(3.41) Pe(x,A) := J e e ' s P(x,A x ds) 

has a unique maximum eigenvalue e^ e \ for which C = ip(d) is the unique 
solution of the equation E u e 9 St ~ t ^ = 1, with corresponding right eigenfunc- 
tions r(-;9) and left eigenmeasures £ v (9,-) defined by 

r(x;9) = E x e e ' s ^ e \ 

(3.42) U9-A) = E x (j2 e9 ' Sn ~ nmi {x n eA}) , 

\n=0 / 

L(9;A) = J t x {6;A)dv(x). 

Let 7r denote the stationary distribution of {^ n } and let 

(3.43) M = (V</>rV)- 

To begin with, consider the special case d = 1 and g(x) = x for which the 
importance sampling measure with transition kernel 

(3.44) P e (x,dy x ds) := e e ' s - m {r(y;9)/r(x;9)}P(x,dy x ds) 

has been shown to be logarithmically efficient by Dupuis and Wang [16] and 
asymptotically optimal by Chan and Lai [8] for simulating the tail probabil- 
ity P XQ {S n /n > b} when 9 is chosen to be 9b in (3.44). We shall show that 
by using SISR with Q = P and resampling weights wt(Yt) = e db ^ t ~'^^ b \ we 
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can avoid computation of the eigenfunctions. To bring out the essence of 
the method, we first assume instead of the minorization condition (3.40) the 
stronger uniform recurrence condition 

(3.45) a u(A x B) < P(x, A x B) < a x v(A x B) 

for some < clq < a\ and probability measure v and for all x E X , measurable 
sets A C X and Borcl sets B C R. At the end of this section, we show how 
this assumption can be removed. Note that Y t consists of i < t, in 

the Markov case. 

Example 3. Let b > E^i and assume that 9 b e& and E v {e 2Qh ^ 2 ^ 9 ^) 
< oo. We now extend Example 1 to Markov additive processes by showing 
that the choice Q = P and 

(3.46) w t (Y t ) = e^'-^o) 

results in logarithmically efficient simulation of P XQ {S n /n > b}. The depen- 

dence of the weights w\ and w\ for i ^ j, created from a combination 
of the Markovian structure of the underlying process and bootstrap resam- 
pling, requires a more delicate peeling and induction argument than that 
in Example 1. By considering & — ip^^/Ob instead of £t, we may assume 
without loss of generality that ip(@b) = 0. 

Let k = sup xeX r(x]6i,)/ ini xe x r(x;6b) and let Eg be expectation with 
respect to Pg. Then by (2.5) and (3.44), 

f t (Y t ) = P X0 {S n /n > b\Y t } = P{S n -S t >nb- S t \X t , S t } 

= r(X i; ^)^Je^( 5 "- 5 *)l {5n „ 5i > nfe „ 5t} /r(X n ;^)|^,5 4 ] 
< Ke ~s b (nb-s t )_ 

We shall show that 

(3.47) E(w 2 ■ • ■ w 2 ) = as m — > oo uniformly over 1 < t < n — 1. 

Then logarithmic efficiency of bootstrap resampling follows from (3.12)- 
(3.15). We first show that for any k < t and i ^ j, 

E{tB 2 (E x(l) e e ^)(E xU) e ebS ^)\T 2k ^ 

(3.48) 

<m -2 V(£J y(tl) e^-t+^E^ e 9 ^-^ 1 ) + m' 1 f3, 

where /3 = sup h > x& x E x {e 29b ^(E Xl e 9bSh ) 2 }, which is finite by (3.45). Note 

that u>k is measurable with respect to T2k~\ and that under bootstrap re- 

(i) (i) 

sampling, X k and XY are independent conditioned on T^k-i- Moreover, 



SEQUENTIAL MONTE CARLO FOR TAIL PROBABILITIES 19 
since X™ =xf with probability wf = w k (Y^ )/X)JLi w k (Y^), 

m 



E{w k (E ( i,e 9fcSl - t )|J 2 n}=%Ti«[ u) %«)^ Si - i , 

k k 

U=l 

which is equal to to -1 X^uLi ^ b ^ k E~( U )e 9bSt ~ k in view of (3.46) and that 
i/>(0 b ) = 0. Hence, 



k 

m \ ^ 

'(") „ a q 



(3.49) 



- V (e^ E- {u) e e » S t-* ) (e^ £ w e**-* ) 

fe fc 



u=l 

-2 , fl, .<?. , w fl. *M 

m 

Since j^Q ) an d (S i-^I ) are independent conditioned on Tik-2 
for u^v and ^[e**** (-B^ w e*> s *-*)| ^-2] = £ y w e " 5 '-^ 1 , (3.48) follows 

fc fc — 1 

from (3.49). 

We shall show using (3.48) and induction, that 

(3.50) E(wl---w 2 k ) <7 2 (l + m- 1 /3) fc where 7 = sup E x e e " Sh {>l). 

x£X,h>0 

For k = l, 

m 

Ew\ = m~ 2 E Xo e e ^ E Xo e e ^ + m~ 2 ^ E XQ e 2e ^ < 7 2 + m" 1 /? 
i^j i=l 

and indeed (3.50) holds. If (3.50) holds for all k <t, then by repeated ap- 
plication of (3.48), starting from k = t, we obtain 

t-i 

E(wj ■ ■ ■ w 2 ) < (E xo e e » s *) 2 + m- l p ]T E(wj ■■■vg) 

k=0 



<7 2 | l + m' 1 /3^2(1 + m~ 1 f3) k \ 



and (3.50) indeed holds for k = t. Hence, (3.47) is true and logarithmic 
efficiency is attained. 

The peeling argument used to derive (3.48) and (3.50) can also be used 
to extend Theorems 1 and 2, which hold for general g, to the following. 
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Theorem 3. (a) Let M , 9t and wt(Yt) be the same as in Theorem 1. 
Then Theorem 1 still holds when the i.i.d. assumption on £j is replaced 
by the uniform recurrence condition (3.45) on the Markov additive process 
(Xt,St = £]. + ••• + &) and assumption (C2) is generalized to 

(3.51) / e 2K ^v(X,d£) <oo where k = sup ||0||. 

(b) Let M, 0t and wt(Yt) be the same as in Theorem 2. Then Theo- 
rem 2 still holds when the i.i.d. assumption on £j is replaced by the uniform 
recurrence condition (3.45) and assumption (C2) is generalized to (3.51). 

Note that Q = P in Theorem 3. We next show how the uniform recurrence 
assumption (3.45) can be removed, extending the preceding results on the 
logarithmic efficiency of suitably chosen SISR procedures to more general 
Markov additive processes such that for some 9 € Q, < /3 < 1, function 
w.X^r [l,oo) and measurable set C: 

(Ul) sup xeC . u{x) < oo, f x u(x) du(x) < oo, sup xeC £ x (9; C) <oo, J x £ x (0; 
C) dv(x) < oo, 

(U2) E^'^-^uiXx)} < (1 - p)u(x) for x <£ C, 

(U3) a:=awp xeC E x {e e 'b-^u(Xi)}<°°, 

(U4) K x ■.= sup xeX E x {e 2e '^- 2 ^u 2 lx 1 )/u 2 (x)}<oo. 

We illustrate in Section 4, Example 5, how (U1)-(U4) can be checked in 
a concrete example. Condition (Ul) [in which £ x is defined in (3.42)] holds 
when C is bounded and v has support on a compact set. Conditions (U2)- 
(U4) are often called "drift conditions" (see [8]). Although the arguments 
are essentially modifications of the peeling idea in Example 3 by making 
use of (U1)-(U4), they are considerably more complicated than those in the 
uniformly recurrent case. We, therefore, only consider the univariate linear 
case [d = 1, g(y) = y] in the following theorem to indicate the basic ideas 
without getting into the details of these modifications, such as replacing for 
general g the 9^ in (3.52) by sequential estimates 9t, as in (3.20) and (3.35). 

Theorem 4. Let b > E w ^i and assume that (U1)-(U4) hold for 9 = 9 b . 
Let Q = P and 

(3.52) w t {Y t ) = e^-^^(X t )MAVi). 

Then (3.6) holds with p n = P Xo {S n /n > b}, for S?b or 2r, as n — > oo and 
m — > oo. 

PROOF. By considering £ t — ij)(Qb)/Qb instead of £t, we assume without 
loss of generality that ip(9 b ) = 0. By (2.4) and (3.52), 

(3.53) ^-i(Y« x ) = e-^uix^/uiX^X). 
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It will be shown in the Appendix that 

(3.54) K 2 := sup E x {e 9 » Sh u(X h )/u(x)} < oo. 

x£X,h>0 

Note that 

f t (Y t ) = E X0 (l {Sn/n>b} \Y t ) < e-^ nb E X0 (e s ^\Y t ) 

(3.55) 

— — e db{St - nb) ExA^- 1 ) < K 2 e db{St - nb) u(X t ). 

Since ^ [/ t (Y t (1) )|^_ 2 ] = /^(Y^), it follows from (3.53), (3.55) and (U3) 
that 

^ {[/ t (Y t (1) ) - /w(Y« ^^(Y^)} 

(3.56) < Kle-™» h E X0 {{w x • • • w^fe 26 ^ u 2 (x )u 2 (X^) /u^X^)} 

<f3e- 2ne " b E X0 (w 2 1 ---wl 1 ), 

where /3 = KiK$u 2 (xo). 

By (3.53) and (3.55), under either bootstrap or residual resampling, 

^[(^-m^fV/KYf^KYf))] 
(3-57) =m-^E X0 [(#? -mw?) 2 f 2 (Y?)h 2 (YP)] 

i=l 

<KiE X0 (wl--w 2 )e- 2n6 >> b u 2 (x ). 

In view of (3.12), it now remains to show (3.47). It follows from the proof 
of (3.48) that for any k < t and % ^ j, 



E x < w k 



E y(i) [e d » s ^u(X t _ k )} \ ,E xij) [e e ^u{X t _ k )} 



,E X ^ [e e ^-^u{X t . k+l )\ \/E x m [e e » s t-^u(X t ^ k+1 )} , 

An argument similar to that in (3.48) and (3.50) can be used to show that 

E X0 (w 2 1 ---w 2 )<K 2 (l + m- 1 /3) k . 
Hence, (3.47) again holds and (3.6) follows from (3.56) and (3.57). □ 

3.3. Implementation, estimation of standard errors and discussion. As 
explained in the first paragraph of Section 3.1, at every stage t, the SISR pro- 
cedure carries out importance sampling sequentially within each simulated 
trajectory but performs resampling across the m trajectories. Since the com- 
putation time for resampling increases with m, it is more efficient to divide 
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the m trajectories into r subgroups of size k so that m = kr and resampling 
is performed within each subgroup of k trajectories, independently of the 
other subgroups. This method also has the advantage of providing a direct 
estimate of the standard error of the Monte Carlo estimate a := r _1 Y^,l=i **i> 
where Sj denotes the SISR estimate of a (using either bootstrap or residual 
resampling) based on the ith subgroup of simulated trajectories. Specifically, 
we can estimate the standard error of a by a/y/r, where 

r 

(3.58) o 3 = (r-l)- 1 ^(a i -a) 2 . 

i=l 

In Section 2 we considered the case of fixed n as m — > oo and provided 
estimates of the standard errors of the asymptotically normal 5?b and Sr. 
The validity of these estimates is unclear for the case n — > oo and m — > oo 
as considered in this section that involves large deviations theory instead 
of central limit theorems. By choosing m = kr with k — > oo and r — > oo 
in (3.58), we still have a consistent estimate c/\/r of the standard error in 
the large deviations setting with n — > oo. 

The resampling weights in Theorems 1 and 2 have closed-form expres- 
sions in terms of the cumulant generating function ip(6) in the i.i.d. case 
or the logarithm ip{9) of the largest eigenvalue of the kernel (3.41) in the 
Markov case. When ip(9) does not have an explicit formula, we can use 
numerical approximations and thereby approximate the logarithmically effi- 
cient resampling weights, as will be illustrated in Example 5. This is, there- 
fore, much more flexible than logarithmically efficient importance sampling 
which involves sampling from the efficient importance measure that involves 
both the eigenvalue and corresponding eigenfunction in the Markov case (see 
[5, 8, 10, 16, 21]). Note that approximating the eigenvalue and eigenfunction 
usually does not result in an importance (probability) measure and, there- 
fore, requires an additional task of computing the normalizing constants. 

The basic ideas in Examples 1 and 2 and Sections 3.1 and 3.2 can be 
extended to more general rare events of the form {Xr € T} and more general 
stochastic sequences X t and stopping times T. To evaluate -P{Xt 6 T} by 
Monte Carlo, it would be ideal to sample from the importance measure Q 
for which 

(3.59) ^(x t ) = P{x T er|x t }/p{x T Gr} fort<r, 

because the corresponding Monte Carlo estimate of P{X-t £ T} would have 
variance (see [16], page 2). This is clearly not feasible because the right- 
hand side of (3.59) involves the conditional probabilities -P{X^ € r|Xj} 
and its expectation P{X-t G T} which is an unknown quantity to be de- 
termined. On the other hand, SISR enables one to ignore the normalizing 
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factor P{Xy G T} and to use tractable approximations to P{X.t & r|X(}, 
as in Example 1, in coming up with a logarithmically efficient Monte Carlo 
estimate of -P{Xy G V}. 

4. Illustrative examples. We use the following two examples to illustrate 
Theorems 1 and 4. 

Example 4. Let Xi,X2,... be i.i.d. random variables with EX\ = 0. 

Let & = (X;, X?) and S n = £i~\ \-£ n . Define v) = y/y/v for y G R and 

■U > and note that g{S n /n) is the self-normalized sum of the Xj's. There is 
extensive literature on the large deviation probability p n = P{g(S n /n) > b} 
(see [12]). Consider the case b = l/y/2 and X, having the density function 

with respect to the Lebesgue measure. Thus, X, is a mixture of N(l, 1) and 
X(-l,l). In this case, = {(6>i, 2 ) : 2 < 1/2}, A = :i> > y 2 } and 

log ^ e 1 1+ 2 1 ) = log ( 2 J + 2 - 2^ + log ( — ?T=m — J 

for # G 0. The infimum of the rate function over the one-dimensional man- 
ifold N = {(y,v) :y= ^/vJ2} is I = 0.324 and is attained at (y,v) = (1,2). 
Then M = {9 = {6 1 ,6 2 ) : (f>(y e ,v e ) < 1} [see (3.18) and Theorem 1]. We im- 
plement SISR with bootstrap resampling as described in Section 3.3, with 
m = 10,000 particles, divided into 100 groups each having 100 particles. The 
results, in the form of mean ± standard error and for n = 15, 20 and 25, are 
summarized in Table 1, which also compares them to corresponding results 
obtained by direct Monte Carlo with m = 10,000 in (2.1) and (2.2). Table 1 
shows 18-fold variance reduction by using SISR when n = 15, 25-fold vari- 
ance reduction when n = 20 and that direct Monte Carlo fails when n = 25. 

Example 5. Let £x,C2>--->7i>72>--- be i.i.d. standard normal random 
variables and let 

(4.1) X n+ i = A(X n ) + Cn+l, £n = X n + 7 n , 



Table 1 

Monte Carlo estimates of P{g(S n /n) > l/\/2} 



n 


15 


20 


25 


SISR (1.10 ± 0.07) x 10" 3 


(1.9 ±0.2) x 10~ 4 


(4.0 ±0.7) x 10" 


Direct (0.9 ± 0.3) x 10~ 3 


(1±1) x 10" 4 
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where X(x) is a monotone increasing, piecewise linear function given by 

\(x) = sl{|x|<i} + 1 {^>i} + (~2~) 1 i x <- 1 }- 

Let 9 > 0. We now show that (U1)-(U4) hold for u(x) = e 2A9x+ and C = 
(— oo,p], where p > 1 is chosen large enough so that (U2) holds, as shown 
below. Since (a + b) + < a + 6 + for a > and since e 2mBx < e~°- 05dx u(x), it 
follows that for x > p, 

< u ( x ) e ~ome XE{e9ll ~ m +i.o59 + 2.w(+ } 

and, therefore, (U2) holds if p is large enough. It is easy to check that (U3) 
holds. Note that sup 2 , g( _ oc>il ] E x [e 2e ^~ 2 ^ u 2 {X{)] < oo and that for x>l, 

E x [e^- 2 ^u 2 {X x )]/u 2 {x) 

= E ^ e 2ex+2e 11 -24,{e)+A.29({x+l)/2+C, 1 )+y e i.2ex+ 

< (e-°- Wx A e ^ )E[e 2e 11 -2^e)+2.W.2ect ] as x TO 

and, therefore, (U4) holds. Since lim. r _ s ._ 00 E x (e e ^-^)=0, it follows that 
lim a; _ i ._ 00 ^.(S; C) = 0; moreover, u(x) = 1 for all x < and hence, (Ul) also 
holds. 

We compute Po{S n /n > 2.5} for SISR using resampling, with m = 10,000 
particles divided into 100 groups, each having 100 particles, and with re- 
sampling weights (3.52) for which the following procedure is used to provide 
a numerical approximation for 62.5- First note that by (4.1), 

(4.2) E x e^=e e2 / 2 E X , 



The procedure involves a finite-state Markov chain approximation to (4.1) 
with states Xj and transition probabilities pij (1 < i,j < 1,000) given by 

1,000 

Xl = — - 2.505, Pij = e -(-i-A(^)) 2 /2 / £ g-to-AO*))'/*. 

k=l 

For given 9, it approximates tp(9) by 9 2 /2 + ip(0), where is the largest 
eigenvalue of the matrix (e j Pij)i<ij<i ooo ; m view of (3.41) and (4.2). Since 
^'(^2.5) = 2.5 by (3.43), it uses Brent's method [20] that involves bracketing 
followed by efficient search to find the positive root #2.5 of the equation 
4>(9) + 9 2 /2 = 2.59, noting that ^(0) = 0. The root 9 2 . 5 = 0.273 is then used 
as an approximation to #2.5 m (3.52). Table 2 gives the results, in the form 
of mean± standard error, for the SISR [with several choices of 9 in (3.52), 
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Table 2 

Monte Carlo estimates of Po{S n /n> 2.5} 








n 




15 


20 


25 


SISR 0.1 


(9.68 ±1.37) x 10" 4 


(2.81 ±0.57) x 10" 4 


(4.70 ± 1.22) x 10" 5 


0.2 


(9.65 ±0.75) x 10" 4 


(2.45 ±0.24) x 10" 4 


(6.70 ±0.64) x 10 -5 


0.273 


(8.31 ±0.48) x 10" 4 


(2.42 ±0.19) x 10" 4 


(6.33 ±0.44) x 10" 5 


0.3 


(9.11 ±0.51) x 10~ 4 


(2.54 ±0.20) x 10~ 4 


(5.27 ±0.38) x 10" 5 


0.4 


(9.78 ±0.80) x 10~ 4 


(2.60 ±0.20) x 10~ 4 


(6.58 ±0.67) x 10~ 5 


Direct 


(8 ±3) x 10~ 4 


(3 ±2) x 10~ 4 






including 9 = 62.5] and direct Monte Carlo estimates of Po{S n /n > 2.5}. It 
shows a variance reduction of 35 times for n = 15 and 80 times for n = 20 
over direct Monte Carlo when #2.5 is used as an approximation to #2.5 in the 
resampling weights (3.52) for SISR. When n = 25, direct Monte Carlo fails 
while the SISR estimate still has a reasonably small standard error. 

APPENDIX: PROOF OF (3.19) AND (3.54) 

PROOF OF (3.19). For < e < I, let 

M £ = {0: (j>(fig) =/-£}, H{9) = {/i € A° : 9'{fi - fig) > 0}. 
If fjL € H(0), then 9' ft > 6' fig and, therefore, 

(A.l) <f>(fi) = sup{9'n - ip(9)} > 6' ft - if)(6) > 9'ftg - ip(9) = I - e. 

Moreover, for 9 £ M e , H{9) is a closed half-space whose boundary is the 
tangent space of {ft : 4>{fx) = I — e} at fig. Hence, 

(A.2) 4>{fi)^I-£ for /ie A \ \J H{9). 

8eM e 

Making use of this and (A.l), we next show that 
(A.3) |J H(9) = {ii:<t>(ft)>I-e} 

and, therefore, by (3.17), 

(A.4) T := {ft: g(fi) >b}d{fi: (fi(fi) > I — e} = (J H(9). 

eeM e 

By (A.l), IJeeM £ c {/" : 4>{^) ^ I ~ £ }- Therefore, it suffices for the proof 
of (A.3) to show that {ft : <p{fi) < I — e] D A° \ \J 6eM H{ff). Suppose this is 
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not the case. Then there exists p\ G A° \ IJeeM -^(^) such that 4>(pi) > 
/ - e. Since A° \ UeeA/, 15 : ^(m) < ^ ~ £ }> there exists ^ 2 G A° \ 
U d&M ^ H(9) such that (^(^2) < I — s. By continuity of 0, there exists p G 
(0, 1) such that 4>(ppi + (1 — p)p-2) = I — e. Since A° \ -ff (6>) is a half-space, 
A° \ [JeeMe H (®) = n6»GA/ £ ( A ° \ H{9)) is convex and, therefore, pp\ + (1 - 
p)/^2 G A° \ UeeMe but this contradicts (A. 2), thereby proving (A. 3). 

Define the measure Q by 

j§(Y B ) = / e s ' s ^^de/vo\(M), 

where vol(M) is the volume of M. Let p n = S n /n and h n {6) = 9'p n — ip(9). 
From (A. 4), it follows that if p n G T, then there exists 6* G M £ such that 
@l(Pn — pe f ) > and, therefore, 

(A.5) h n (9*) = 6ip n - 4(9*) > Km, - rP(9*) = 0(/i e J = I - e, 

since 0* G M £ . Let £ n = {9 :{9 -9*)'Vh n {9*) > 0, ||0-0*|| < ra~ 1/2 }. Then for 
all 9 G B n , h n {9) = h n {9*) + {9- 9*)'Vh n (9*) -(9- 0*yv 2 ifj(9*)(0 - 0*)/2 + 
o(\\9 — #*|| 2 ) and, therefore, by (A.5) and the definition of B n , 

h n (9)>I-e-{K + l) /(2ra) for all large n, 

where K = sup 0gA / ||V 2 ^((9)||. Hence, for all large n, 

^(Y n )>l Wer} I e W {nh n (6)}d6/vol(M) 

(A.6) B " 

>l {ft „ e r}(^/ 2 ) en '" n£ " (Ar+1)/2 "- d/2 /vol(M), 

in which a denotes the volume of the ti-dimensional unit ball. Letting e — > 
in (A.6) yields (dQ/dP)(Y n ) > e n/+o(n) l {Aln6 r} in which o{n) is uniform 
in Y n . Hence, 



P{g(S n /n) > b\Y k } = E Q 



^(Y n )l {Sn/ner} ^(Y k ] 



" dP [ kh 



proving (3.19). □ 



To prove (3.54), we use ideas similar to those in the proof of Lemma 1 
of [7] and the following result of [19], page 568. 

Lemma 3. Let r(0) = 0. Under (3.40), there exist regeneration times t(i), 
i > 1, such that: 

(i) r(i + 1) — t{i), i > 0, are i.i.d. random variables. 
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(ii) {^r(i)'---' X r(i+i)-i^T(i)+i,---,CT(i+i)}, i = 0,l,..., are indepen- 
dent blocks, 

(iii) X T u\ has distribution v for all i > 1. 

PROOF of (3.54). Let I x = ^{En=l e^M^n)}, 4 = /4 and 
A = {r(i) : z > 1}. Since u > 1, 

^{e^upTfc)} = ^{e e6Sfc u(X fc )l {T > fc} } 
fe-i 

(A.7) + ^^(e ef ' s n {jej 4 } )^(e e '' 5fe -^(X fc _ i )l {T > fe _, } ) 

3=1 

<4+4[supE x (e^l - eA} ) 
L i>i 

Let < cr = cr(l) < cr(2) < • • • be the hitting times of C. Then 



4<£jxy* 5 «u(x n )j 

(A.8) 

Kk:a(k)<r n=a(k)+l J 

Let y G By (U2), for all n > 2, 

£ y {e^u(*n)l { n<a}} < (1 - /3)S 3/ (e^ s - 1 n(X n _ 1 )l {n _ 1 < (T} ), 
from which it follows by proceeding inductively and applying (U3) that 

(A.9) E y \f^e 9bSn u{X n )\ </r 1 max{a,(l-/3)u(y)}<cm(y), 



k n=l 



where a = (3 1 max{a, (1 — /?)}. Substitution of (A.9) into (A.8) then yields 



't-1 



(A.10) 4 <a|«(x) + ^ ^e e " s "n(X„)l {XneC} J j < au(s) + r?4(^; C), 

where ?7 = sup^g^ «(y). Since tt(x) dv(x) < oo and f x £ x (0b', C) du{x) < oo, 
it follows from (A. 10) that £„ <oo. Combining 



l y&C 



with (A.9) yields 



(A.ll) sup{4(^;C)/«(x)}<oo. 
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Then 



(A. 12) sup-E^e 8 " 5 *! 



{keA}) = supQ*{r(i) 



k for some i} < 1. 



fe>i 



From (A.7), (A. 10), (A.ll) and 

E x (e e » s n {jeA} ) = E x (e e ^l {T=j} ) 



j'-i 



+ ^S x .(e^^l {r=M )^( e l 




/i=i 




fc>i 



{fcG 



A})} 



(3.54) follows from (A. 12). □ 
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